Untitled.ipynb
[1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
[3]:
| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |
[5]:
| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1020 | 59 | 1 | 1 | 140 | 221 | 0 | 1 | 164 | 1 | 0.0 | 2 | 0 | 2 | 1 |
| 1021 | 60 | 1 | 0 | 125 | 258 | 0 | 0 | 141 | 1 | 2.8 | 1 | 1 | 3 | 0 |
| 1022 | 47 | 1 | 0 | 110 | 275 | 0 | 0 | 118 | 1 | 1.0 | 1 | 1 | 2 | 0 |
| 1023 | 50 | 0 | 0 | 110 | 254 | 0 | 0 | 159 | 0 | 0.0 | 2 | 0 | 2 | 1 |
| 1024 | 54 | 1 | 0 | 120 | 188 | 0 | 1 | 113 | 0 | 1.4 | 1 | 1 | 3 | 0 |
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   age       1025 non-null   int64
 1   sex       1025 non-null   int64
 2   cp        1025 non-null   int64
 3   trestbps  1025 non-null   int64
 4   chol      1025 non-null   int64
 5   fbs       1025 non-null   int64
 6   restecg   1025 non-null   int64
 7   thalach   1025 non-null   int64
 8   exang     1025 non-null   int64
 9   oldpeak   1025 non-null   float64
 10  slope     1025 non-null   int64
 11  ca        1025 non-null   int64
 12  thal      1025 non-null   int64
 13  target    1025 non-null   int64
dtypes: float64(1), int64(13)
memory usage: 112.2 KB
None
age sex cp trestbps chol \
count 1025.000000 1025.000000 1025.000000 1025.000000 1025.00000
mean 54.434146 0.695610 0.942439 131.611707 246.00000
std 9.072290 0.460373 1.029641 17.516718 51.59251
min 29.000000 0.000000 0.000000 94.000000 126.00000
25% 48.000000 0.000000 0.000000 120.000000 211.00000
50% 56.000000 1.000000 1.000000 130.000000 240.00000
75% 61.000000 1.000000 2.000000 140.000000 275.00000
max 77.000000 1.000000 3.000000 200.000000 564.00000
fbs restecg thalach exang oldpeak \
count 1025.000000 1025.000000 1025.000000 1025.000000 1025.000000
mean 0.149268 0.529756 149.114146 0.336585 1.071512
std 0.356527 0.527878 23.005724 0.472772 1.175053
min 0.000000 0.000000 71.000000 0.000000 0.000000
25% 0.000000 0.000000 132.000000 0.000000 0.000000
50% 0.000000 1.000000 152.000000 0.000000 0.800000
75% 0.000000 1.000000 166.000000 1.000000 1.800000
max 1.000000 2.000000 202.000000 1.000000 6.200000
slope ca thal target
count 1025.000000 1025.000000 1025.000000 1025.000000
mean 1.385366 0.754146 2.323902 0.513171
std 0.617755 1.030798 0.620660 0.500070
min 0.000000 0.000000 0.000000 0.000000
25% 1.000000 0.000000 2.000000 0.000000
50% 1.000000 0.000000 2.000000 1.000000
75% 2.000000 1.000000 3.000000 1.000000
max 2.000000 4.000000 3.000000 1.000000
Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
dtype='object')
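The cells that produced the inspection outputs above are not shown. The following is a minimal sketch of how such outputs are typically generated, assuming the data lives in a DataFrame named `df`. The in-memory CSV is a stand-in built from the five rows in the `head()` output; the real file name is not given in the notebook, so `heart.csv` in the comment is only a guess.

```python
import io

import pandas as pd

# Stand-in for the real file: the five rows shown in the head() output.
# In the notebook this would be something like pd.read_csv("heart.csv"),
# where the path is an assumption.
csv_text = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
61,1,0,148,203,0,1,161,0,0.0,2,1,3,0
62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows
print(df.tail())      # last rows
df.info()             # prints dtypes and non-null counts; returns None,
                      # so print(df.info()) adds a trailing "None"
print(df.describe())  # per-column summary statistics
print(df.columns)     # column labels

# Two quick data-quality checks that complement the summaries above.
n_missing = int(df.isna().sum().sum())
n_duplicates = int(df.duplicated().sum())
print(f"missing values: {n_missing}, duplicated rows: {n_duplicates}")
```

On the full 1025-row data, the duplicate count is worth a look before any cross-validation, since repeated rows can end up in both the training and validation folds.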
[19]:
RandomForestClassifier(random_state=42)
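The output above is the repr of a fitted `RandomForestClassifier(random_state=42)`; the cell that fitted it is not shown. A rough sketch of a typical split, scale, and fit sequence follows. The synthetic data from `make_classification`, the 80/20 split, and all variable names are assumptions made for illustration, not the notebook's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 13 predictor columns and the binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

# Hold out 20% for testing, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training part only, then apply it to both parts,
# so no information from the test rows leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = RandomForestClassifier(random_state=42)
model.fit(X_train_scaled, y_train)

test_accuracy = model.score(X_test_scaled, y_test)
importances = model.feature_importances_  # one weight per feature
print(f"Test accuracy: {test_accuracy:.4f}")
```

Tree ensembles do not need scaled inputs; the scaler is included only because the notebook imports `StandardScaler` and later cells use an `X_scaled` array.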
[24]:
RFE(estimator=LogisticRegression(max_iter=5000, random_state=42),
    n_features_to_select=10)
[26]:
RFE(estimator=LogisticRegression(max_iter=5000, random_state=42),
    n_features_to_select=10)
[27]:
RFE(estimator=LogisticRegression(max_iter=5000, random_state=42, solver='saga'),
    n_features_to_select=10)
Selected Features: Index(['age', 'sex', 'cp', 'exang', 'oldpeak', 'ca', 'thal',
'trestbps_chol_interaction', 'thalach_age_ratio', 'age_group_55-70'],
dtype='object')
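The selected feature names suggest three kinds of engineered columns (a product, a ratio, and age-group dummies) followed by recursive feature elimination. The sketch below shows one way this could be done. The synthetic data, the age bin edges, and the choice of 5 features are assumptions; only the column naming pattern and the `RFE`/`LogisticRegression` settings come from the outputs above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with value ranges taken from the describe() output.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "age": rng.integers(29, 78, n),
    "trestbps": rng.integers(94, 201, n),
    "chol": rng.integers(126, 565, n),
    "thalach": rng.integers(71, 203, n),
    "oldpeak": rng.uniform(0, 6.2, n).round(1),
})
# Hypothetical target: split a noisy score at its median so both classes occur.
score = X["thalach"] - 20 * X["oldpeak"] + rng.normal(0, 15, n)
y = (score > score.median()).astype(int)

# Engineered features, computed on the raw (unscaled) values.
X["trestbps_chol_interaction"] = X["trestbps"] * X["chol"]
X["thalach_age_ratio"] = X["thalach"] / X["age"]

# Age groups; the bin edges are assumed, not taken from the notebook.
age_group = pd.cut(
    X["age"], bins=[0, 40, 55, 70, 120],
    labels=["<40", "40-55", "55-70", "70+"],
)
X = pd.concat(
    [X, pd.get_dummies(age_group, prefix="age_group", dtype=int)], axis=1
)

# Scale, then let RFE drop the weakest feature one step at a time.
X_scaled = StandardScaler().fit_transform(X)
selector = RFE(
    estimator=LogisticRegression(max_iter=5000, random_state=42),
    n_features_to_select=5,
)
selector.fit(X_scaled, y)

selected_features = X.columns[selector.support_]
print("Selected Features:", selected_features)
```

RFE ranks features by the magnitude of the logistic regression coefficients, which is why the inputs are standardized first.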
Logistic Regression CV Accuracy: 0.8517
Best Random Forest Parameters: {'max_depth': 10, 'min_samples_split': 2, 'n_estimators': 100}
Random Forest CV Accuracy: 0.9941
SVM CV Accuracy: 0.8507
Best Model: Random Forest with Accuracy: 0.9941
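The scores above compare three models by cross-validated accuracy, with the random forest tuned by grid search. A minimal sketch of such a comparison follows. The synthetic data, the 5-fold setting, and the candidate values in `param_grid` are assumptions; only the three hyperparameter names are taken from the reported best parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 10 selected features and the target.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# Pipelines refit the scaler inside every fold, so validation rows
# never influence the scaling statistics.
log_reg = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=5000, random_state=42)
)
svm = make_pipeline(StandardScaler(), SVC(random_state=42))

scores = {
    "Logistic Regression": cross_val_score(
        log_reg, X, y, cv=5, scoring="accuracy"
    ).mean(),
    "SVM": cross_val_score(svm, X, y, cv=5, scoring="accuracy").mean(),
}

# Assumed candidate values; the notebook only reports the winning combination.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}
grid = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid,
    cv=5, scoring="accuracy",
)
grid.fit(X, y)
scores["Random Forest"] = grid.best_score_

print("Best Random Forest Parameters:", grid.best_params_)
for name, acc in scores.items():
    print(f"{name} CV Accuracy: {acc:.4f}")

best_name = max(scores, key=scores.get)
print(f"Best Model: {best_name} with Accuracy: {scores[best_name]:.4f}")
```

A forest score as far above the other two models as the 0.9941 reported here is worth double-checking. If the data contains duplicated rows (`df.duplicated().sum()` would show this), copies of the same record can land in both training and validation folds, and a forest can effectively memorize them.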
[38]:
RandomForestClassifier(random_state=42)
[50]:
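import matplotlib.pyplot as plt
import seaborn as sns
# Create DataFrame for plotting
df_plot = pd.DataFrame(X_scaled, columns=X.columns)
df_plot['target'] = y
# Plot pairwise relationships, colored by target
g = sns.pairplot(df_plot, hue='target', diag_kind='kde')
# pairplot returns a grid of subplots; plt.title would label only the last one,
# so set a figure-level title instead
g.fig.suptitle('Pair Plot of Features', y=1.02)
plt.show()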
[51]:
# Add engineered features to the DataFrame
age_scaled = X_scaled[:, X.columns.get_loc('age')]
chol_scaled = X_scaled[:, X.columns.get_loc('chol')]
df_plot['age_chol_interaction'] = age_scaled * chol_scaled
# Note: if X_scaled is standardized (mean 0), the denominator is close to zero
# for ages near the mean, so this ratio can take very large values of either sign
df_plot['chol_age_ratio'] = chol_scaled / (age_scaled + 1e-10)
# Plot histograms side by side
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.histplot(df_plot['age_chol_interaction'], kde=True)
plt.title('Age-Cholesterol Interaction')
plt.subplot(1, 2, 2)
sns.histplot(df_plot['chol_age_ratio'], kde=True)
plt.title('Cholesterol to Age Ratio')
plt.tight_layout()
plt.show()
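The ratio in the cell above is computed from `X_scaled`. If that array holds standardized values, as the name and the `StandardScaler` import suggest, the ratio behaves quite differently from a ratio of the raw columns. The sketch below illustrates the difference using the `age` and `chol` values from the first five rows of the data; computing the ratio before scaling is offered as one possible alternative, not as the notebook's method.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# age and chol from the first five rows shown in the head() output.
raw = pd.DataFrame({
    "age": [52, 53, 70, 61, 62],
    "chol": [212, 203, 174, 203, 294],
})

# Ratio of raw values: always positive and directly interpretable.
raw_ratio = raw["chol"] / raw["age"]

# Ratio of standardized values: both columns are centered at zero, so the
# sign flips from row to row and the magnitude grows as age nears its mean.
scaled = StandardScaler().fit_transform(raw)
scaled_ratio = scaled[:, 1] / (scaled[:, 0] + 1e-10)

# Alternative: build the ratio from raw columns, then standardize the result.
ratio_standardized = StandardScaler().fit_transform(raw_ratio.to_frame())

print(raw_ratio.round(2).tolist())
print(scaled_ratio.round(2).tolist())
```

Plotting `raw_ratio` instead of `scaled_ratio` in the histogram would show the distribution of cholesterol per year of age rather than a quantity dominated by near-zero denominators.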
